Does modeling lead to more accurate classification?: A study of relative efficiency in linear classification

Authors

  • Yoonkyung Lee
  • Rui Wang

Abstract

Classification arises in a wide range of applications, and a variety of statistical tools have been developed for learning classification rules from data. Understanding their relative merits helps users choose an appropriate method in practice. This paper compares, theoretically and in terms of error rate, model-based classification methods from statistics with algorithmic methods from machine learning. Extending Efron's comparison of logistic regression with linear discriminant analysis (LDA) under the normal setting, we contrast algorithmic methods such as the support vector machine (SVM) and boosting with LDA and logistic regression, and we study their relative efficiencies in reducing the error rate based on the limiting behavior of each method's classification boundary. We show that algorithmic methods are generally less effective than model-based methods in the normal setting. In particular, compared with LDA, the loss of efficiency in error rate is typically about 33 to 60% for the SVM and 50 to 80% for boosting. However, a smooth variant of the SVM is shown to be even more efficient than logistic regression. In addition to the theoretical study, we present numerical experiments under various settings that compare finite-sample performance and robustness to mislabeling and model misspecification.
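
The following is a minimal sketch, not the authors' code, of the kind of comparison the abstract describes, restricted to a single feature. It assumes two equally likely classes with X | Y=k ~ N(mu_k, 1) and means -Δ/2 and +Δ/2, so the Bayes error is Φ(-Δ/2). As a finite-sample stand-in for Efron-style relative efficiency (an assumption, not the paper's boundary-based definition), it takes the ratio of Monte Carlo expected excess error over the Bayes error for plug-in LDA and for logistic regression fitted by Newton-Raphson. The SVM and boosting analyses are not reproduced, and all function names are hypothetical.

```python
import math
import random


def phi(z):
    """Standard normal CDF, computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def error_rate(a, b, mu0, mu1, sigma=1.0):
    """True error of the rule 'predict 1 if a + b*x > 0' when the two classes
    are equally likely and X | Y=k ~ N(mu_k, sigma^2)."""
    if b == 0:
        return 0.5
    c = -a / b  # decision threshold on x
    p_above_0 = 1.0 - phi((c - mu0) / sigma)  # P(X > c | Y=0)
    p_below_1 = phi((c - mu1) / sigma)        # P(X < c | Y=1)
    if b > 0:  # predict 1 when x > c
        return 0.5 * p_above_0 + 0.5 * p_below_1
    # b < 0: predict 1 when x < c
    return 0.5 * (1.0 - p_above_0) + 0.5 * (1.0 - p_below_1)


def fit_lda(xs, ys):
    """Plug-in LDA in one dimension: pooled variance and estimated priors."""
    x0 = [x for x, y in zip(xs, ys) if y == 0]
    x1 = [x for x, y in zip(xs, ys) if y == 1]
    m0, m1 = sum(x0) / len(x0), sum(x1) / len(x1)
    ss = sum((x - m0) ** 2 for x in x0) + sum((x - m1) ** 2 for x in x1)
    s2 = ss / (len(xs) - 2)
    b = (m1 - m0) / s2
    a = -b * (m0 + m1) / 2 + math.log(len(x1) / len(x0))
    return a, b


def fit_logistic(xs, ys, iters=25):
    """Logistic regression (intercept a, slope b) by Newton-Raphson."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(a + b * x)
            r, w = y - p, p * (1.0 - p)
            g0 += r
            g1 += r * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:
            break
        # Newton step: theta += I^{-1} g, with I the observed information
        da = (h11 * g0 - h01 * g1) / det
        db = (-h01 * g0 + h00 * g1) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b


def relative_efficiency(delta=2.0, n=200, reps=200, seed=0):
    """Ratio of Monte Carlo expected excess errors, LDA over logistic.
    Values below 1 mean LDA has the smaller excess error."""
    rng = random.Random(seed)
    mu0, mu1 = -delta / 2, delta / 2
    bayes = phi(-delta / 2)
    ex_lda = ex_log = 0.0
    for _ in range(reps):
        ys = [rng.randint(0, 1) for _ in range(n)]
        xs = [rng.gauss(mu1 if y else mu0, 1.0) for y in ys]
        ex_lda += error_rate(*fit_lda(xs, ys), mu0, mu1) - bayes
        ex_log += error_rate(*fit_logistic(xs, ys), mu0, mu1) - bayes
    return ex_lda / ex_log
```

Calling relative_efficiency(delta=Δ) for a few values of Δ gives a rough picture of how the gap between the two linear rules changes with class separation. The results are Monte Carlo estimates and will vary with seed, n, and reps; they are not a substitute for the paper's asymptotic analysis.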

Similar resources

Providing a New Model to Improving DEA-based Models in Multi-criteria Inventory Classification (Case Study: Pars Khazar)

Abstract Objective: Many organizations use the ABC classification method to control their large amount of inventories. The most common way to classify inventories is the ABC method. In traditional ABC classification, items are only classified according to one criteria. But there are other criteria that need to be considered in the inventory classification. The purpose of this study is to prese...

Data Envelopment Analysis with Sensitive Analysis and Super-efficiency in Indian Banking Sector

Data envelopment analysis (DEA) is non-parametric linear programming (LP) based technique for estimating the relative efficiency of different decision making units (DMUs) assessing the homogeneous type of multiple-inputs and multiple-outputs. The procedure does not require a priori knowledge of weights, while the main concern of this non-parametric technique is to estimate the optimal weights o...

Multi-Group Classification Using Interval Linear Programming

  Among various statistical and data mining discriminant analysis proposed so far for group classification, linear programming discriminant analysis has recently attracted the researchers’ interest. This study evaluates multi-group discriminant linear programming (MDLP) for classification problems against well-known methods such as neural networks and support vector machine. MDLP is less compli...

Data Envelopment Analysis with LINGO Modeling for Technical Educational Group of an Organization

Data Envelopment Analysis (DEA) was developed to help compare the relative performance of decision-making units. It is a non-parametric method for performing frontier analysis. It uses linear programming to estimate the efficiency of multiple decision-making units and it is commonly used in production, management and economics [3]. DEA generates an efficiency score between 0 and 1 for each unit...

Modeling and design of a diagnostic and screening algorithm based on hybrid feature selection-enabled linear support vector machine classification

Background: In the current study, a hybrid feature selection approach involving filter and wrapper methods is applied to some bioscience databases with various records, attributes and classes; hence, this strategy enjoys the advantages of both methods such as fast execution, generality, and accuracy. The purpose is diagnosing of the disease status and estimating of the patient survival. Method...

Journal:
  • J. Multivariate Analysis

Volume 133

Publication date: 2015